Titanic Dataset

In this project, we work with the Titanic dataset, loaded through seaborn's load_dataset, and explore it with a set of data manipulation and visualization techniques.


In [1]:
import seaborn as sns 
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
sns.set_style('whitegrid')

In [3]:
titanic = sns.load_dataset('titanic')

In [4]:
titanic.head()


Out[4]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True

Jointplot comparing fare and age


In [6]:
sns.jointplot(x='fare',y='age',data=titanic)


Out[6]:
<seaborn.axisgrid.JointGrid at 0xae5efd0>

Histogram showing the distribution of the fare column


In [8]:
# distplot is deprecated since seaborn 0.11; histplot draws the same histogram (no KDE by default)
sns.histplot(titanic['fare'],bins=30,color='red')


Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0xbc0e828>

Boxplot of age by passenger class


In [10]:
sns.boxplot(x='class',y='age',data=titanic,palette='rainbow')


Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0xbd0ef98>

In [12]:
sns.swarmplot(x='class',y='age',data=titanic,palette='Set2')


Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0xbdf7cf8>
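
The center line of each box in the boxplot above is the median age for that passenger class. As a rough sketch of that computation, the snippet below uses only the five rows printed by titanic.head(), not the full dataset, so its numbers will differ from the plot. The names rows, ages_by_class and median_age are illustrative and not part of the notebook.

```python
from collections import defaultdict
from statistics import median

# ('class', 'age') pairs copied from the five rows shown by titanic.head().
rows = [
    ('Third', 22.0),
    ('First', 38.0),
    ('Third', 26.0),
    ('First', 35.0),
    ('Third', 35.0),
]

# Group the ages by passenger class, as the x='class' argument does.
ages_by_class = defaultdict(list)
for passenger_class, age in rows:
    ages_by_class[passenger_class].append(age)

# The median of each group is what the box's center line marks.
median_age = {c: median(a) for c, a in ages_by_class.items()}
print(median_age)
```

On the full DataFrame, the same grouping could likely be written as titanic.groupby('class')['age'].median().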

A simple count plot displaying the number of passengers by sex


In [14]:
sns.countplot(x='sex',data=titanic)


Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0xbe1a5f8>
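
Each bar in a count plot has a height equal to the number of rows in that category. The sketch below tallies the 'sex' values from the five rows printed by titanic.head(), so it only illustrates the idea and does not reproduce the totals for the full dataset. The names sex_sample and sex_counts are illustrative.

```python
from collections import Counter

# Values of the 'sex' column from the five rows shown by titanic.head().
sex_sample = ['male', 'female', 'female', 'female', 'male']

# One entry per category; the value is the bar height countplot would draw.
sex_counts = Counter(sex_sample)

for category, count in sorted(sex_counts.items()):
    print(f"{category}: {count}")
```

On the full DataFrame, titanic['sex'].value_counts() should return the same kind of tally.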

A heatmap showing the pairwise correlations between the numeric columns of the dataset


In [16]:
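# numeric_only=True (pandas >= 1.5) skips text columns such as 'sex', which raise an error in pandas 2.x
sns.heatmap(titanic.corr(numeric_only=True),cmap='coolwarm')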
plt.title('titanic.corr()')


Out[16]:
<matplotlib.text.Text at 0xc0f7ac8>
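
Each cell of the heatmap above holds a Pearson correlation coefficient, which is the default method of DataFrame.corr(). As a minimal sketch of what one cell represents, the snippet below computes the coefficient for age and fare by hand, using only the five rows printed by titanic.head(). The result therefore will not match the value in the heatmap, which is computed over all passengers. The helper pearson is illustrative and not part of the notebook.

```python
from math import sqrt

# 'age' and 'fare' values from the five rows shown by titanic.head().
age = [22.0, 38.0, 26.0, 35.0, 35.0]
fare = [7.25, 71.2833, 7.925, 53.1, 8.05]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r = pearson(age, fare)
print(round(r, 3))
```

The coefficient always lies between -1 and 1, which is why a diverging colormap such as 'coolwarm' suits this plot: negative and positive correlations get opposite colors.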

Two histograms of age, one per sex, plotted using FacetGrid


In [18]:
g = sns.FacetGrid(data=titanic,col='sex')
g.map(plt.hist,'age')


Out[18]:
<seaborn.axisgrid.FacetGrid at 0xc151860>